ABSTRACT


With the development of personalized learning in techno- 
logical platforms, more data and information are given to 
instructors on what content is appropriate for a learner’s
next step, with an aim of helping them support their stu- 
dents in navigating an optimized learning path that can 
promote an enhanced learning outcome. In this study, we 
collected data from an online learning platform, Learnta®
TAD, which allows teachers to distribute tasks based on system
recommendations. The recommendations are directed
by the system’s knowledge graph algorithm, determining 
whether the student is ready to learn the task (i.e. the 
task is within the student’s Zone of Proximal Development), 
whether the student is not yet ready to learn the task, or 
whether the student has already mastered the task. We used 
the acquired data to investigate whether giving content in 
each of these groups results in different learning outcomes. 
We apply subgroup analysis, Fisher’s exact test, and logistic
regression to address this question. Replicating a prior,
smaller-scale study, our findings suggest that students gain
more mastery when assigned Ready-to-Learn tasks than when
assigned Unready-to-Learn tasks, across Math and English,
more and less successful students, and in-class and homework
settings. Moreover, students who are given already mastered
tasks perform better than those who are given Ready-to-Learn
or Unready-to-Learn tasks across all groups.
1. INTRODUCTION


Increasingly, teachers’ decisions are driven by data [3], with
more data becoming available from online learning
environments [9]. Using reports from online learning systems,
educators are able to track and evaluate each student’s learn- 
ing based on data [1]. However, even though data are given 
to teachers by these systems, instructors are still impeded 
by having insufficient knowledge about how to use the data 
[7]. In other words, teachers still have difficulties in using 
data effectively to decide what students need to learn next, 
to maximize learning outcomes and expedite the learning 
process. 


This problem is exacerbated in online learning systems that 
give relatively more agency to teachers in choosing which 
content their students will work with. Although such sys- 
tems are easier to integrate with existing pedagogical prac- 
tices, they raise questions as to whether teachers will assign 
the best possible content. We can consider this decision in 
terms of whether a teacher selects content that falls within 
a learner’s zone of proximal development (ZPD) [8]. A task 
within a learner’s ZPD is one that he or she can succeed in, 
but only with external support or scaffolding. Tasks that 
a learner can succeed in without support, and tasks that 
a learner cannot succeed in even with support, fall outside 
of the learner’s ZPD. Although the ZPD has been a pop- 
ular concept in the educational literature for decades, only 
limited attention has been paid to ZPD in educational data 
mining and related communities [4]. 


However, recent research has found evidence that Vygotsky’s 
concept of the ZPD can be beneficial to the design of adap- 
tive learning systems [10]. In that work, Zou and colleagues 
investigated whether teachers make good instructional deci- 
sions based on student performance data. They compared 
“Ready-to-Learn” (RtL) content inside the ZPD to content
that students were “Unready-to-Learn” (UtL), using automated
assessments of student progress through a curriculum
based on a knowledge graph. 


We replicate and build on this work with a larger student 
sample, assessing whether a task is RtL for a specific student 
using the prerequisite structure within a knowledge graph. 
Our hypothesis is that, as in [10], students will gain more
mastery (successfully complete more objectives within the
system) if they are assigned RtL tasks instead of UtL tasks.
We also investigate whether the findings in [10] are robust to 
whether the student is completing tasks as homework as op- 
posed to in class. We hypothesize that students working on 
content in class will gain more mastery than students com- 
pleting tasks as homework, due to the availability of greater 
learning support and scaffolding in an in-class context [6]. 
We also investigate whether the findings in [10] are robust to 
the general level of success of the student. If some students 
are simply faster or better learners than others in a domain 
(e.g. [2]), then they may be able to perform better even 
when given UtL content. However, one could also argue that if the
knowledge graph is correct, then all students should have 
similar (poorer) outcomes for UtL content, since regardless 
of their general ability they lack the building blocks to ac- 
quire the content they are given. Finally, we investigate 
whether the results in [10] are robust across two different 
learning subjects, English and Mathematics. 


2. THE ONLINE LEARNING PLATFORM 
The system used in this study is a learning platform for
K-12 students in China, called Learnta® TAD, developed
by Learnta Inc. TAD, an acronym of “Teacher +
Artificial Intelligence + Data”, is a system that gives teachers
data on student learning progress, recommends optimal
learning paths using AI algorithms, and then allows teachers
to decide which content students should work on.
Learnta® TAD is primarily used in blended learning,
where teachers give students face-to-face instruction in
the classroom.


Figure 1: Teacher’s interface of the Learnta® TAD system
(panel labels: Intelligent Teaching, Learning Analytics,
Customized Contents)


In TAD, teachers assign learning tasks that contain several 
target skills to the students. The system infers each stu- 
dent’s mastery of each skill using Bayesian Knowledge Trac- 
ing (BKT) (Corbett & Anderson, 1995) by predicting the 
student’s latent knowledge state according to the student’s 
correctness on questions related to the skills. Learnta’s di- 
rected knowledge graph maps content to a prerequisite struc- 
ture, representing which prerequisite content is necessary to 
know to learn a particular piece of content. Based on the 
mastery of the student and the prerequisite structure of each
skill, Learnta® TAD recommends RtL content for teachers
to teach. More specifically, content is considered RtL if
the student has mastered all the prerequisites of that skill; 
UtL indicates that the student is missing one or more of a 
skill’s prerequisites. Whether or not the teachers choose to 
follow the recommendations, the system collects data on the 
students’ performance and learning outcomes. Teachers can 
assign material that is RtL, UtL, or even Already Mastered 
(AM). 
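

The mastery inference and labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Learnta's implementation: the BKT parameter values, the 0.95 mastery cutoff, the skill names, and the rule that an already mastered skill is labeled AM before its prerequisites are checked are all hypothetical choices made for the example.

```python
from typing import Dict, List

MASTERY_THRESHOLD = 0.95  # assumed cutoff; the paper does not state Learnta's value


def bkt_update(p_know: float, correct: bool, p_slip: float = 0.1,
               p_guess: float = 0.2, p_transit: float = 0.15) -> float:
    """One Bayesian Knowledge Tracing step: condition on the response, then learn."""
    if correct:
        evidence = p_know * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_know) * p_guess)
    else:
        evidence = p_know * p_slip
        posterior = evidence / (evidence + (1 - p_know) * (1 - p_guess))
    # Chance of moving from unlearned to learned after this practice opportunity.
    return posterior + (1 - posterior) * p_transit


def zpd_status(skill: str, p_mastery: Dict[str, float],
               prerequisites: Dict[str, List[str]]) -> str:
    """Label a skill AM, RtL, or UtL following the definitions in the text."""
    def mastered(s: str) -> bool:
        return p_mastery.get(s, 0.0) >= MASTERY_THRESHOLD

    if mastered(skill):
        return "AM"   # assumed precedence: mastered skills are never RtL/UtL
    if all(mastered(pre) for pre in prerequisites.get(skill, [])):
        return "RtL"  # every prerequisite is mastered
    return "UtL"      # at least one prerequisite is missing


# Hypothetical three-skill chain: fractions -> ratios -> proportions.
prereqs = {"fractions": [], "ratios": ["fractions"], "proportions": ["ratios"]}
mastery = {"fractions": 0.97, "ratios": 0.40, "proportions": 0.10}
statuses = {s: zpd_status(s, mastery, prereqs) for s in prereqs}
```

With these made-up numbers, the first skill is labeled AM, the second RtL, and the third UtL, because its only prerequisite has not yet been mastered.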


Figure 2: Teacher using Learnta® TAD in the classroom


3. DATA COLLECTION 


To investigate our research questions around ZPD status and 
students’ learning outcomes, we collected data from 7913 
middle school and elementary school students who studied 
250,783 task cards (one task card contains several skills) in 
Learnta® TAD, during 2019. 


In the context of both English and Math, we categorized stu- 
dents into different levels based on their earlier assessment 
test performance: 1) Excellent students; 2) Normal students; 
3) Struggling students. Excellent students are those who 
mastered at least 80% of the skills in the assessment, ac- 
cording to Bayesian Knowledge Tracing. Normal students 
are those who mastered at least 60% but less than 80% of 
the skills in the assessment. Struggling students are those 
who mastered less than 60% of the skills in the assessment. 
The proportions of these three student categories are 32.39%,
53.78% and 13.83%, respectively. 
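

The grouping rule above amounts to two cutoffs on the share of assessment skills mastered. A minimal sketch follows; the function name and the input format (a fraction between 0 and 1) are assumptions made for illustration.

```python
def success_level(fraction_mastered: float) -> str:
    """Map the share of assessment skills mastered to the three levels in the text."""
    if not 0.0 <= fraction_mastered <= 1.0:
        raise ValueError("fraction_mastered must be between 0 and 1")
    if fraction_mastered >= 0.80:
        return "Excellent"
    if fraction_mastered >= 0.60:
        return "Normal"
    return "Struggling"


# Boundary cases: 80% counts as Excellent and 60% as Normal.
levels = [success_level(f) for f in (0.85, 0.80, 0.79, 0.60, 0.59)]
```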


In addition, we compare the use of the system in a
classroom setting to its use as homework. In class, students
complete the assigned tasks under the supervision of their
teachers during a class session. Within the homework
context, students are expected to complete their tasks at home.
The shares of these two scenarios are 57.5% and 42.5%,
respectively.


4. STATISTICAL ANALYSIS 


We compare learning outcomes across teachers’ decisions
about which skills a student should work on. The analyses are
conducted separately for two subjects, Math and English.
The outcome of interest is whether the student mastered
the skill according to BKT. The percentage of skills that are
mastered is tabulated for each type of teaching decision:
RtL, UtL, and AM.
In addition to descriptive statistics, we conduct Fisher’s
exact test to assess the association between instructional
decisions and student mastery. Our hypothesis is that students
are more likely to master RtL skills than UtL skills.
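

To make the test concrete, the sketch below computes a two-sided Fisher’s exact p value for a 2x2 table of decision (RtL vs. UtL) by outcome (mastered vs. not mastered). The counts are hypothetical and not taken from the study. The two-sided rule used here, summing the probabilities of all tables with the same margins that are no more likely than the observed one, is one common definition and is stated as an assumption.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as probable as the observed one.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts: rows are RtL and UtL, columns are mastered / not mastered.
rtl_mastered, rtl_not, utl_mastered, utl_not = 60, 40, 45, 55
p_value = fisher_exact_two_sided(rtl_mastered, rtl_not, utl_mastered, utl_not)
sample_odds_ratio = (rtl_mastered * utl_not) / (rtl_not * utl_mastered)
```

The sample odds ratio is reported alongside the p value because the test only indicates whether an association exists, not its size.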


In addition, a logistic regression model is used, with learning
outcome as the dependent variable, and teacher’s decision,
student’s level, and whether learning occurs in a classroom
as predictors.
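

As a simplified illustration of such a model, the sketch below fits a logistic regression with an intercept and a single binary predictor (1 for RtL, 0 for UtL) by Newton-Raphson. It deliberately omits student level and setting, uses hypothetical counts, and is not the model fitted in this study; with one binary predictor the fitted odds ratio equals the odds ratio of the corresponding 2x2 table.

```python
from math import exp


def fit_logistic_one_predictor(xs, ys, iterations=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * x      # gradient with respect to b1
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 100 UtL skills (40 mastered) and 100 RtL skills (60 mastered).
xs = [0] * 100 + [1] * 100
ys = [1] * 40 + [0] * 60 + [1] * 60 + [0] * 40
intercept, slope = fit_logistic_one_predictor(xs, ys)
odds_ratio_rtl_vs_utl = exp(slope)
```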


P values are calculated in R version 3.6.3 using the fisher.test() 
function for Fisher’s exact test and the glm() function for 
logistic regression. 


5. RESULTS 


For the tasks in Math, the completion rates were 76.5%,
70%, and 65%, respectively, for the excellent, normal and
struggling students. The completion rates were 79%, 72%,
and 66.5% in English. These findings indicate that
students’ completion rates vary depending on overall student
success, χ²(df = 2, N = 93874) = 650.29, p < 0.001 for
Math and χ²(df = 2, N = 146127) = 1465.87, p < 0.001 for
English.
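

The chi-square statistics above come from a test of independence on a table of student level by completion status. The sketch below shows that computation on hypothetical counts, since the underlying counts are not reported here. For two degrees of freedom the p value has the closed form exp(-statistic / 2), which the helper relies on.

```python
from math import exp


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df2(statistic: float) -> float:
    """Upper-tail chi-square probability; valid only for 2 degrees of freedom."""
    return exp(-statistic / 2.0)


# Hypothetical counts: rows are excellent, normal, struggling students;
# columns are completed / not completed tasks.
counts = [[30, 10], [20, 20], [10, 30]]
statistic, df = chi_square_independence(counts)
p_value = p_value_df2(statistic)
```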


The completion rates for in-class tasks were 75.7% for Math
and 74.7% for English, and for homework tasks the completion
rates were 65.3% for Math and 70.3% for English (see
Figures 3 and 4). Fisher’s exact tests show that in-class tasks
were more likely to be completed than homework tasks
for both Math (p < 0.001) and English (p < 0.001).


Figure 3: Completion Rate in Math


Figure 4: Completion Rate in English


The mastery rates by subject and student success level are
presented in Figures 5 and 6. Fisher’s exact tests showed
that the excellent students had higher mastery rates than
the normal students (Math, 68.6% vs. 52.1%, p < 0.001;
English, 63.6% vs. 54.7%, p < 0.001). The mastery
rates were much lower for the struggling students (Math,
36.9%, p < 0.001; English, 11.9%, p < 0.001).


Figure 5: Mastery in Math subject


Figure 6: Mastery in English subject


Figures 7 and 8 show that the average mastery rate of RtL 
tasks was significantly higher than that of UtL tasks, p < 
0.001 for each of the three student success levels in each 
subject, using Fisher’s exact test. 


The logistic regression provided further evidence that ZPD 
status was associated with students’ learning outcome (F(2, 
38891) = 119.85, p < 0.001 for Math and F(2, 1996) = 30.74, 
p < 0.001 for English), with adjustment for task type and 
student success levels. In particular, an RtL task was more
likely to be mastered than a UtL task (Math, OR = 1.710, p 
< 0.001; English, OR = 7.709, p < 0.001), but was less likely 
to be mastered than an AM task (OR = 0.241, p < 0.001 for 
Math and OR = 0.185, p < 0.001 for English). Moreover, the 
logistic regression also suggested that students were more 
likely to master a math skill in class compared to homework 
(t(38891) = 2.676, p = 0.007), while the mastery rates of 
English skills were similar between the two settings (t(1996) 
= 0.706, p = 0.480). 
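

To give a sense of scale for the reported odds ratios, the sketch below converts an odds ratio into a change in mastery probability. The 50% baseline mastery rate for UtL tasks is a hypothetical value chosen for illustration, and the reported odds ratios are adjusted for other predictors, so the resulting probabilities should be read only as a rough guide.

```python
def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


p_utl = 0.50  # hypothetical baseline mastery probability for a UtL task
p_rtl_math = apply_odds_ratio(p_utl, 1.710)     # Math odds ratio from the text
p_rtl_english = apply_odds_ratio(p_utl, 7.709)  # English odds ratio from the text
```

Under this assumed baseline, the Math odds ratio corresponds to a mastery probability of roughly 0.63 for an RtL task, and the English odds ratio to roughly 0.89.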


Figure 7: ZPD vs. Mastery in Math subject


Figure 8: ZPD vs. Mastery in English subject


Moreover, interaction terms were added to the logistic
regression model in order to test the hypothesis that the
relationship between ZPD status and learning outcome was
different for students with various success levels (i.e.,
excellent, normal, struggling). Within Math, the improvement
in learning outcome associated with RtL status was
comparable among the three student groups
(F(2, 30664) = 4.374, p = 0.126), which is consistent with
the observation that the three lines corresponding to
different success levels are almost parallel in Figure 7.
Within English, however, the analysis indicated an
interaction effect between RtL status and student
success levels (F(2, 1587) = 8.763, p < 0.001): the excellent
students tended to benefit less from being assigned an RtL
task instead of a UtL task than either normal students (p
< 0.001) or struggling students (p = 0.002). The conclusion
with respect to AM tasks was less clear because there were
fewer struggling students to begin with.


Lastly, we did not find statistical evidence for there being 
an interaction effect between ZPD status and whether the 
system was used in class or as homework (t(38891)= -0.282, 
p = 0.778 for Math and t(1996) = 1.859, p = 0.063 for En- 
glish). This suggests that it is likely important to assign RtL 
content to students regardless of which setting the system 
is used in, although it may be warranted to continue
investigating whether RtL content has more benefit for students
studying English in class, based on the marginally significant
p value in that analysis.


6. DISCUSSION & CONCLUSION 


In the light of these results, we can re-consider our origi- 
nal research questions. We hypothesized that, as in [10], 
students would master more tasks if presented with content 
thought to be in their ZPD (Ready-to-Learn content) than 
content outside of their current ZPD (Unready-to-Learn con- 
tent). Our findings are compatible with this hypothesis, pro- 
viding a replication of the earlier work in [10]. We also find 
that this pattern replicates across two domains, Math and 
English. 


Our second hypothesis was that students would have higher
mastery rates in class than when completing homework; this
hypothesis was upheld for Math but not for English.
Students were slightly more likely to master a Math skill
in class than as homework, while the mastery rates of
English skills were comparable between the two contexts.
This finding may suggest that
the learning support within the platform was more effective 
than anticipated; alternatively, it may be that the instruc- 
tors using the platform in their classes have not yet learned 
effective pedagogies for teaching students using this type 
of technology. Effective teaching in these contexts involves 
different pedagogies than are necessary within traditional 
classrooms [6], and there is increasing evidence that many 
teachers do not adopt these pedagogies until their second 
year of teaching with a new technology [5]. 


Our third research question asked whether generally more 
successful students would perform better than other stu- 
dents, even for content seemingly outside their zone of proxi- 
mal development. In line with past work by Liu and Koedinger 
(2015) [2], it seemed that these more successful students 
were more able to succeed, even on this content that was 
anticipated to be highly difficult. However, they still per- 
formed more poorly on this content than on content thought 
to be in their ZPD. 


Overall, these results suggest that assigning content with
regard to a student’s zone of proximal development can
lead to a higher probability of the student mastering the 
content they are given. This result, a replication of [10], 
appears to hold in more than one learning domain. However, 
there are several important areas of future work before this 
finding can truly be held to be robust. First, this finding 
should be replicated in a broader range of contexts — other 
learning systems, other learning domains, and a wider range 
of learner populations and countries. Second, it is probably 
warranted to look at other definitions of the ZPD to refine 
this finding — is there an optimal degree of prior mastery for 
assignment of a student within the knowledge graph? Would 
alternate definitions of ZPD, such as seen in Murray and 
Arroyo’s work (2002)[4], be equally or more effective? Does 
this type of finding also hold within systems where content 
is not consolidated into skills but is more factual in nature? 
By learning the answers to these questions, we can improve
the effectiveness of adaptive learning systems more broadly, 
while helping to better operationalize and understand one of 
the classic theories in the history of thought on education. 